Return-Path: <news@newsmaster.cc.columbia.edu>
Received: from newsmaster.cc.columbia.edu (newsmaster.cc.columbia.edu [128.59.35.30])
by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA15001
for <kermit.misc@watsun.cc.columbia.edu>; Wed, 26 Aug 1998 10:19:31 -0400 (EDT)
Received: (from news@localhost)
by newsmaster.cc.columbia.edu (8.8.5/8.8.5) id KAA03002
for kermit.misc@watsun; Wed, 26 Aug 1998 10:19:31 -0400 (EDT)
Path: news.columbia.edu!panix!howland.erols.net!news.idt.net!psinntp!pubxfer.news.psi.net!usenet
From: "Bob Kennedy" <bkennedy@peco-energy.com>
Newsgroups: comp.protocols.kermit.misc
Subject: How I Implemented an application using C-Kermit for VMS
Date: Wed, 26 Aug 1998 10:13:20 -0400
Organization: PSINet
Lines: 89
Message-ID: <6s156a$s0k$1@client3.news.psi.net>
NNTP-Posting-Host: 159.214.60.222
X-Newsreader: Microsoft Outlook Express 4.72.2106.4
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4
Xref: news.columbia.edu comp.protocols.kermit.misc:9142
Plant Monitoring System
System Health Monitor
Written By: Robert S. Kennedy
Date: August 14, 1998
I work for a utility company, and one of my many functions here is to
maintain a Plant Monitoring System (PMS).
This PMS is a complex, redundant system. It uses a high-speed,
intelligent, distributed Data Acquisition System (DAS) to collect and
produce plant data, and transmit this data to host processors. The data is
manipulated and validated by application programs, and then presented to
operations personnel and other pertinent users on video terminals, plotters,
and printers. The key to configuring, controlling, and maintaining this
complex configuration is knowledge of the hardware, peripheral components,
applications, and the software design and implementation.
The PMS runs on two VAX 4000-600 host computers per unit, connected as
Ethernet nodes. There are 24 additional 4000-90a micro-VAX workstations. We
run VMS version 6.1 on our four main hosts. Because this system is so
complex, it can fail in many ways: disk space is usually at a premium,
printer queues regularly fail or stop, or the DAS shuts down. Any one of
these failures is highly visible to the users of the system, so it is
important to resolve them quickly. In the past, we had a pager that the
operations department could call in the event of a computer emergency. We
would respond as quickly as we could, but many times the effort came too
late. The need to respond more quickly, and to look for emerging problems
before they became failures, was growing more and more pressing.
After a brainstorming session with other members of my group, we came up
with a solution: a DCL command procedure that would run once an hour and
check various items in the system, such as disk space, stopped queues, and
whether the DAS was down. If a check found a problem, any one of the four
host nodes could call our digital pager with a numeric code describing the
problem. That solution worked, and worked well, until we started adding
more functions to this "System Health Checker". The more we added, the more
codes were needed, and pretty soon we had to carry around a "cheat sheet"
describing all the codes. What we needed was one of those new "fancy
dancy" alphanumeric pagers, so the system could call us and tell us in
"English", not numbers, what had gone wrong.
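
To make the idea concrete, here is a minimal sketch in Python (not the DCL
procedure itself) of an hourly pass in which every check reports either
nothing or a plain-English description of the problem, so no code table is
needed. The function names, device and queue names, thresholds, and node
name are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional

# A probe returns None when healthy, or a plain-English problem description.
Probe = Callable[[], Optional[str]]


def disk_space_problem(device: str, free_blocks: int,
                       min_free_blocks: int) -> Optional[str]:
    """Report a problem when free space drops below the (assumed) threshold."""
    if free_blocks < min_free_blocks:
        return f"{device} low on disk space: {free_blocks} blocks free"
    return None


def queue_problem(queue: str, state: str) -> Optional[str]:
    """Report a problem when a queue is stopped or stalled."""
    if state.lower() in ("stopped", "stalled"):
        return f"queue {queue} is {state.lower()}"
    return None


def run_checks(node: str, probes: List[Probe]) -> List[str]:
    """Run every probe once and collect readable messages, tagged by node."""
    problems = []
    for probe in probes:
        message = probe()
        if message is not None:
            problems.append(f"{node}: {message}")
    return problems


if __name__ == "__main__":
    # Hypothetical readings; a real checker would query the system instead.
    probes = [
        lambda: disk_space_problem("DKA100", 1200, 50000),
        lambda: queue_problem("SYS$PRINT", "Stopped"),
        lambda: queue_problem("SYS$BATCH", "Idle"),
    ]
    for line in run_checks("NODE1", probes):
        print(line)
```

Adding a new check in this arrangement means adding one more probe; nothing
has to be appended to a cheat sheet.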
While 'surfing' the Internet one day, I happened upon the website
'www.columbia.edu' and found a 'Kermit' page there. I was familiar with the
use of "Kermit" through our VAX/VMS systems, and started reading about the
new release of "C-Kermit". Being a programmer who knows how to read and
program in "C", I became more curious. I downloaded the newest version of
"C-Kermit" for our VAX/VMS system, installed it, and found that it ran
without a hitch. As I went on to read the miscellaneous documents, I found
that this version of "C-Kermit" was not freeware but shareware, and that I
should purchase the book as payment for the program. After receiving the
book, I breezed through it and found a section on "Calling an Alphanumeric
Pager". This was just what the doctor ordered.
After talking to the company that provides our pagers, I obtained all the
necessary documents on how to use their Telocator Alphanumeric Protocol
(TAP) service. Using the example macro in the C-Kermit book on "sending a
one-line alpha page using TAP" as a guide, I was able to send alpha
messages to our newly acquired on-call alpha pager. Armed with this new
technology, I was able to apply it to our "Health Checker".
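
For readers unfamiliar with TAP, the heart of a page is a single
checksummed block. The Python sketch below is an illustration only; it is
neither the macro from the C-Kermit book nor the script used on our system.
It builds one block using the usual TAP framing: STX, pager ID, CR, message,
CR, ETX, a three-character checksum, CR. The pager ID and message are made
up, and the modem dialing and login handshake that come before the block
are left out.

```python
STX, ETX, CR = "\x02", "\x03", "\r"


def tap_checksum(data: str) -> str:
    """Sum the 7-bit character values, then encode the low 12 bits as three
    characters in the range '0' (0x30) through '?' (0x3F)."""
    total = sum(ord(ch) & 0x7F for ch in data)
    return "".join(chr(0x30 + ((total >> shift) & 0x0F)) for shift in (8, 4, 0))


def tap_block(pager_id: str, message: str) -> str:
    """Frame one page; the checksum covers everything from STX through ETX."""
    body = f"{STX}{pager_id}{CR}{message}{CR}{ETX}"
    return body + tap_checksum(body) + CR


if __name__ == "__main__":
    # Hypothetical pager ID and message text.
    block = tap_block("123", "ABC")
    print(repr(block))  # the checksum field for this block is "17;"
```

A real sender would also have to respect the paging service's limits on
block length and act on the acknowledgement the service returns for each
block; both are omitted here.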
Our "Health Checker" now performs approximately 17 separate functions on
each host node every hour. It monitors disk space, queues, the DAS, other
systems attached to our PMS through a serial link, miscellaneous
applications, and itself: one of the four nodes is responsible for checking
whether the "Health Checker" is running in the "SYS$BATCH" queue on the
other three. Each of these separate sub-modules can call the alpha pager
with a message describing the problem. This is done through a system-wide
logical name pointing to a DCL command procedure that runs "C-Kermit" with
script code using TAP. Once an hour all four nodes check their respective
functions, and twice a day the system pages us with a message that each of
the four nodes is "AOK".
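
The paging policy described above can be sketched roughly as follows, again
in Python rather than DCL. The node names and the two hours chosen for the
"AOK" page are assumptions made for the example, as are the function names;
how the real procedure finds the batch jobs is not shown.

```python
from typing import Dict, Iterable, List, Set

# Assumed hours (24-hour clock) for the twice-daily "AOK" page.
AOK_HOURS = (8, 20)


def missing_checkers(expected_nodes: Iterable[str],
                     nodes_with_job: Set[str]) -> List[str]:
    """Watchdog: list nodes whose health-checker job was not found."""
    return [node for node in expected_nodes if node not in nodes_with_job]


def pages_for_hour(hour: int,
                   problems_by_node: Dict[str, List[str]]) -> List[str]:
    """Decide which pages to send after one hourly pass.

    A node with problems gets one page per problem, every hour.
    A healthy node is only reported at the scheduled AOK hours.
    """
    pages = []
    for node, problems in problems_by_node.items():
        if problems:
            pages.extend(f"{node}: {problem}" for problem in problems)
        elif hour in AOK_HOURS:
            pages.append(f"{node}: AOK")
    return pages


if __name__ == "__main__":
    # Hypothetical node names and results for a 03:00 pass.
    results = {"NODE1": ["DAS is down"], "NODE2": []}
    print(pages_for_hour(3, results))
    print(missing_checkers(["NODE2", "NODE3", "NODE4"], {"NODE2", "NODE4"}))
```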
C-Kermit opened the door for us to search proactively for problems, and
gave the system a means of contacting us in an emergency with a
well-defined "English" message. This is most valuable around 03:30 in the
morning, when the system calls to tell us that the DAS is down or that some
other problem has occurred. If we were still using that "digital pager", we
would need to find the cheat sheet and then look up the code describing the
problem. This, for most of us, is difficult at three in the morning.